MySQL 5.7+
最近线上某个业务的单表关键字模糊查询一直比较缓慢,该业务需要通过关键字对多个字段同时进行模糊查询,故根据 MyBatis 动态 SQL 按如下实现方式实现:
1 | <if test="keywords != null"> |
通过分析索引和数据,发现这些列都已经加了索引,可能导致的原因就是某些列存在值为 null 的情形。故将原因初步定位于 null 值列对 索引 的影响。
在维基百科中搜索到一篇资料:NULL(SQL))
里面有这么一段话:
Effect on index operation
Some SQL products do not index keys containing NULLs. For instance, PostgreSQL versions prior to 8.3 did not, with the documentation for a B-tree index stating that[21]
B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular, the PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved in a comparison using one of these operators: < ≤ = ≥ >
Constructs equivalent to combinations of these operators, such as BETWEEN and IN, can also be implemented with a B-tree index search. (But note that IS NULL is not equivalent to = and is not indexable.)
In cases where the index enforces uniqueness, NULLs are excluded from the index and uniqueness is not enforced between NULLs. Again, quoting from the PostgreSQL documentation:[22]
When an index is declared unique, multiple table rows with equal indexed values will not be allowed. Nulls are not considered equal. A multicolumn unique index will only reject cases where all of the indexed columns are equal in two rows.
This is consistent with the SQL:2003-defined behavior of scalar Null comparisons.
Another method of indexing Nulls involves handling them as not distinct in accordance with the SQL:2003-defined behavior. For example, Microsoft SQL Server documentation states the following:[23]
For indexing purposes, NULLs compare as equal. Therefore, a unique index, or UNIQUE constraint, cannot be created if the keys are NULL in more than one row. Select columns that are defined as NOT NULL when columns for a unique index or unique constraint are chosen.
Both of these indexing strategies are consistent with the SQL:2003-defined behavior of Nulls. Because indexing methodologies are not explicitly defined by the SQL:2003 standard, indexing strategies for Nulls are left entirely to the vendors to design and implement.
主要意思就是索引一般使用 B-Tree 实现,而 B-Tree 是为了处理 等于、范围查找、排序等操作,对 null 没法使用 = 操作
在 高性能 MySQL 第二版 page 64 中也提到:
要尽量避免 NULL
要尽可能地把字段定义为 NOT NULL。即使应用程序无须保存 NULL(没有值),也有许多表包含了可空列(Nullable Column),这仅仅是因为它为默认选项。除非真的要保存 NULL,否则就把列定义为 NOT NULL。
MySQL难以优化引用了可空列的查询,它会使索引、索引统计和值更加复杂。可空列需要更多的储存空间,还需要在MySQL内部进行特殊处理。当可空列被索引的时候,每条记录都需要一个额外的字节,还可能导致 MyISAM 中固定大小的索引(例如一个整数列上的索引)变成可变大小的索引。
即使要在表中储存「没有值」的字段,还是有可能不使用 NULL 的。考虑使用 0、特殊值或空字符串来代替它。
把 NULL 列改为 NOT NULL 带来的性能提升很小,所以除非确定它引入了问题,否则就不要把它当作优先的优化措施。然后,如果计划对列进行索引,就要尽量避免把它设置为可空。
现采用如下优化方式:
使用 CONCAT_WS 函数将所有的字段连接以后在模糊查询
1 | <if test="keywords != null"> |
但是如此就不能使用索引了,此时可以利用 MySQL 5.7+ 虚拟列的功能对 CONCAT_WS 的字段创建一个虚拟列:
1 | ALTER TABLE purchase_supply_catalog_using_info |
并对虚拟列创建索引:
1 | ALTER TABLE purchase_supply_catalog_using_info ADD INDEX idx_virtual_keywords ( virtual_keywords ); |
至此,通过实践积累,了解了 SQL 中 NULL 的特殊性, CONCAT 和 CONCAT_WS 函数的差异, 虚拟列 和 虚拟列索引 等相关知识。